ABSTRACT


The field of non-coding RNA biology has been ham- 
pered by the lack of availability of a comprehen- 
sive, up-to-date collection of accessioned RNA se- 
quences. Here we present the first release of RNA- 
central, a database that collates and integrates in- 
formation from an international consortium of estab- 
lished RNA sequence databases. The initial release 
contains over 8.1 million sequences, including rep- 
resentatives of all major functional classes. A web 
portal (http://rnacentral.org) provides free access to 
data, search functionality, cross-references, source 
code and an integrated genome browser for selected 
species. 


INTRODUCTION 


In recent years, there has been a tremendous growth in the 
number of reported sequences of non-coding RNAs (ncR- 
NAs). Large-scale genome sequencing has identified new 
representatives of well-known functional classes, but additionally,
many new types of ncRNA have been reported, including
piRNAs (1) and circRNAs (2). However, information
about such sequences is often ‘locked up’ in the supplementary
materials associated with publications, or may
be referenced only through the chromosomal location of 
the encoding gene, making it cumbersome for biologists 
and bioinformaticians to extract the relevant data. To ad- 
dress this problem, specialist databases have been created 
for many types of ncRNAs to extract and abstract this in- 
formation and to present it in a coordinated fashion on the 
web. Examples include miRBase (3), gtRNAdb (4), Rfam
(5) and NONCODE (6). Additionally, for certain model 
species, there are specialist genome-centric databases that
include ncRNAs within their scope; for example, the Saccharomyces
Genome Database (SGD) (7) contains information
about all ncRNA genes in the budding yeast Saccharomyces
cerevisiae.

At present, some tools that researchers take for granted 
when analyzing protein sequences are not available for ncR- 
NAs. For example, it has not been possible to carry out a 
sequence search of an individual ncRNA against all known 
ncRNAs due to the lack of a comprehensive collection of ncRNAs. The
equivalent operation has been fundamental for the advance 
of protein science. Identifying the full complement of ncR- 
NAs for a particular species is also not possible, except 
for a few model organisms that have been intensively stud- 
ied. Bringing all known ncRNA sequences into a common 
database would also enable the identification of sequences 
that are shared between resources and those that are
found in only one resource. These comparisons should
provide opportunities for linking between RNA informa- 
tion resources as well as providing quality control between 
different sources of ncRNA sequence.

The need for a comprehensive ncRNA sequence database
was identified at a meeting of RNA researchers at Hinx- 
ton in 2010 (8), which highlighted the rapid growth in both 
ncRNA sequence and functional information. It was pro- 
posed that such a resource should utilize the expert com- 
munity of RNA researchers through incorporation of data 
from the numerous ncRNA databases already in existence. 
To address these needs, and accelerate RNA research, we 
have developed RNAcentral, which aggregates information 
from a federation of ncRNA sequence databases. RNAcen- 
tral combines these resources to provide a comprehensive 
and consistent collection of accessioned ncRNA sequences. 
In addition, RNAcentral acts as a hub that allows users to 
navigate from RNAcentral back to the source of the RNA 
sequences. In the future, we plan to develop RNAcentral 
further to incorporate additional datatypes and informa- 
tion about RNA structure, sequence modifications, RNA- 
RNA and RNA-protein interactions, and function. 


RNAcentral Expert Databases 


Databases that contribute sequence data to RNAcentral 
are known as Expert Databases. Ten such databases (3-5,9-16)
have contributed to the current release (see Table 1 for details).
The number of sequences contributed by
each database and the level of quality assurance each offers
vary: the European Nucleotide Archive (ENA), for example,
contributes over 6.5 million sequences to RNAcentral,
some of which have received manual attention, while others
have been generated through unsupervised automated
annotation processes. In contrast, lncRNAdb (9) contributes just 62
sequences of long non-coding RNAs (lncRNAs), all of which
are annotated with detailed information and supporting references.
Thus, RNAcentral provides broad coverage
of RNA sequence, while including rich and high-quality
annotation for a subset of sequences. We are currently in the
process of incorporating further Expert Databases, and welcome
contact from any ncRNA databases that would like to
be included.


Figure 1. RNAcentral architecture. Block arrows indicate data flow, while
solid arrows indicate web traffic and quality control.


DATA FLOW 
Architecture of RNAcentral 


RNAcentral has been implemented with the goal of making 
the resource sustainable through reuse of, and extensions to, 
existing bioinformatics infrastructure. In particular, core se- 
quence data flow (including data submission/capture, vali- 
dation, storage and retrieval services), cross-reference main- 
tenance and sequence similarity search, are provided using 
services from the ENA (10), an established data resource 
that provides generic sequence archiving for the scientific 
community. The overall architecture and data flows into 
RNAcentral are illustrated in Figure 1. 

Content of relevance to RNAcentral is archived di- 
rectly as part of ENA in one of three ways. First, Expert 
Database staff report their ncRNA sequence data holdings 
into RNAcentral, using an ENA submission process, with 
assistance as appropriate from RNAcentral staff. Second, 
general ENA users who deposit sequence data for public 
dissemination provide ncRNA (and other) data through 
ENA submission services. Third, data reported into global 
partner databases of the International Nucleotide Sequence 
Database Collaboration (INSDC; (17)), the global nucleotide
sequence database of record, are mirrored to ENA
through automatic processes that operate on a nightly basis.
Specialist services and data presentations have been engi- 
neered under RNAcentral to support the selection and rout- 
ing of ncRNA sequence data from ENA into RNAcentral- 
specific data flows. 


Unique RNA sequence identifiers 


A major roadblock for the field of RNA biology is the lack 
of a set of consistent and stable accessions for RNA se- 
quences. The goal of the current stage of the project is to 
catalog all known ncRNA sequences. To achieve this, RNA- 
central assigns Unique RNA Sequence ids (URS) to distinct 
RNA sequences, no matter which species they are from. 
This approach parallels that of the UniProt Archive (Uni- 
Parc) database (18). The benefits of this design choice are 
that the mapping from an identifier to an exact sequence 
is unique and will not change over time. In addition, the 
design allows a rapid look up of new sequences to check 
whether they already exist in RNAcentral. One downside 
of this design is that it creates many identifiers for sets of 
closely related sequences. We will address this issue in future
releases, as described below.


Table 1. Expert Databases from which sequence data are already incorporated into RNAcentral

Database name | Description | URL
ENA | European Nucleotide Archive; provides the complete set of ncRNA sequence data reported by the scientific community to the databases of the International Nucleotide Sequence Database Collaboration (INSDC; (17)) as part of conventional scientific best practice (10). | http://www.ebi.ac.uk/ena/
Rfam | Database of ncRNA families and cis-regulatory elements with a broad taxonomic coverage (5). | http://rfam.xfam.org/
RefSeq | A comprehensive, non-redundant, well-annotated set of reference sequences including genes and transcripts (11). | http://www.ncbi.nlm.nih.gov/refseq/
VEGA | Database of vertebrate gene annotation that provides a high-quality set of lncRNAs produced by manual annotation (12,13). | http://vega.sanger.ac.uk
gtRNAdb | Contains tRNA gene predictions on complete or nearly complete genomes from a broad range of species (4). | http://gtrnadb.ucsc.edu/
miRBase | Database of published microRNA sequences and annotations (3). | http://mirbase.org/
RDP | Provides a quality-controlled, aligned and annotated set of small ribosomal subunit RNA sequences (14). | http://rdp.cme.msu.edu/
tmRNA Website | Provides information about the tmRNA molecule found in bacteria and some organelles (15). | http://bioinformatics.sandia.gov/tmrna/
SRPDB | Provides information about the signal recognition particle RNA molecule (16). | http://rnp.uthscsa.edu/rnp/SRPDB/SRPDB.html
lncRNAdb | Provides comprehensive information about experimentally characterized lncRNA molecules (9). | http://www.lncrnadb.org/


The Unique RNA Sequence identifiers have the following 
format: URS + a sequentially assigned 10-digit hexadecimal 
number (e.g. URS00000478B7). The naming scheme can accommodate
more than one trillion sequences (16^10). Once
created, the URS ids cannot be modified, deleted or reassociated
with a different RNA sequence. Each URS identifier
is uniquely associated with a checksum computed on
the uppercase DNA version of the sequence using the MD5
algorithm described in RFC 1321 (http://www.ietf.org/rfc/rfc1321.txt).
These checksum values support fast lookup of
identical sequences via the RNAcentral user interfaces. 
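
The identifier format and checksum lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the RNAcentral implementation: the class name UrsRegistry, the in-memory dictionary and the choice to start numbering at 1 are assumptions made for the example. Only the URS format, the MD5 algorithm and the uppercase DNA normalization come from the text.

```python
import hashlib


def sequence_checksum(sequence: str) -> str:
    """MD5 hex digest of the uppercase DNA version of a sequence."""
    # Normalize: drop whitespace, uppercase, and convert RNA 'U' to DNA 'T'.
    dna = "".join(sequence.split()).upper().replace("U", "T")
    return hashlib.md5(dna.encode("ascii")).hexdigest()


def format_urs(counter: int) -> str:
    """Render an integer as 'URS' followed by 10 uppercase hexadecimal digits."""
    if not 0 <= counter < 16 ** 10:
        raise ValueError("counter does not fit in 10 hexadecimal digits")
    return f"URS{counter:010X}"


class UrsRegistry:
    """Hypothetical in-memory registry: one identifier per distinct sequence."""

    def __init__(self) -> None:
        self._by_checksum = {}  # checksum -> URS identifier

    def register(self, sequence: str) -> str:
        checksum = sequence_checksum(sequence)
        if checksum not in self._by_checksum:
            # Assumption: identifiers are handed out sequentially from 1.
            self._by_checksum[checksum] = format_urs(len(self._by_checksum) + 1)
        # An existing sequence always maps back to the identifier it was given.
        return self._by_checksum[checksum]


registry = UrsRegistry()
first = registry.register("ACGUACGUACGU")
again = registry.register("acgtacgtacgt")  # same sequence, written as lowercase DNA
print(first, again, 16 ** 10)
```

Because the checksum is computed on a normalized form, the RNA and DNA spellings of the same sequence resolve to a single identifier, and 16^10 = 1 099 511 627 776 confirms the capacity of just over one trillion identifiers.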


Keeping track of cross-references 


Every Unique RNA Sequence is associated with one or 
more cross-references (xrefs) pointing to the correspond- 
ing entries in the Expert Databases (e.g. the sequence 
URS00000478B7 is a human SRP RNA found in the SRPDB,
Rfam, RefSeq and lncRNAdb databases). A cross-reference
tracking system associates the Unique RNA
Sequence identifiers with the accessions used by the
Expert Databases. During each RNAcentral release, cross- 
references can be added, kept active or deactivated (when 
the sequence is no longer present in the Expert Database). 
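
The three cross-reference outcomes can be expressed as set operations between the xrefs of the previous release and those of a new import. The sketch below is only an illustration of that bookkeeping; the tuple layout, the function name classify_xrefs and the accessions in the example are hypothetical.

```python
from typing import Dict, Set, Tuple

# (URS identifier, Expert Database name, accession in that database)
Xref = Tuple[str, str, str]


def classify_xrefs(previous: Set[Xref], incoming: Set[Xref]) -> Dict[str, Set[Xref]]:
    """Compare the xrefs of the previous release with those of a new import."""
    return {
        "added": incoming - previous,        # first seen in this release
        "kept_active": incoming & previous,  # still present in the Expert Database
        "deactivated": previous - incoming,  # no longer present in the Expert Database
    }


# Hypothetical accessions, for illustration only.
previous_release = {
    ("URS00000478B7", "Rfam", "example-rfam-1"),
    ("URS00000478B7", "SRPDB", "example-srpdb-1"),
}
new_import = {
    ("URS00000478B7", "Rfam", "example-rfam-1"),
    ("URS00000478B7", "RefSeq", "example-refseq-1"),
}
for status, xrefs in classify_xrefs(previous_release, new_import).items():
    print(status, sorted(xrefs))
```

Deactivated xrefs are kept in a separate bucket rather than deleted, which matches the description of references being deactivated when a sequence leaves an Expert Database.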


Quality control 


One of the most important functions of RNAcentral is 
to provide quality control of the incoming data. We work 
closely with the Expert Databases to ensure that all data 
are self-consistent and meet the INSDC standards. We also 
examine the existing INSDC data to discover entries inap- 
propriate for RNAcentral. For example, all ncRNA features 
defined using the order location operator were filtered out 
because the sequences of such entries do not represent con- 
tiguous sequences. 

In addition, several ‘common sense’ rules for excluding 
sequences from RNAcentral have been implemented: 


• Sequences that are shorter than 10 nucleotides are not
included because they are not likely to represent biologi- 
cally relevant ncRNAs (for a comprehensive list of ncR- 
NAs and their sizes, the reader is referred to a recent 
review (19)). This cutoff is currently applied to all se- 
quences, but in the future we may develop different cut- 
offs for different RNA types. 

• The sequences in INSDC may include ‘N’ characters to
indicate that the identity of some residues has not been 
established. While such sequences are allowed in RNA- 
central, entries where ‘N’ residues constitute more than 
10% of the sequence length are filtered out. This proce- 
dure excludes ~0.1% of candidate sequences and about 
5% of sequences with at least 1 ‘N’. As a result, in this 
release (version 1.0), 374 705 sequences contain ‘N’ char- 
acters, half of which have only one unknown residue. 
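
The two exclusion rules above amount to a simple predicate on each candidate sequence. The following sketch assumes the thresholds stated in the text (a minimum length of 10 nucleotides and at most 10% ‘N’ residues); the function and constant names are illustrative and are not taken from the RNAcentral code base.

```python
MIN_LENGTH = 10             # sequences shorter than this are excluded
MAX_N_PERCENT = 10          # entries with more than 10% 'N' residues are excluded


def passes_quality_control(sequence: str) -> bool:
    """Return True if a sequence satisfies both exclusion rules."""
    seq = sequence.upper()
    if len(seq) < MIN_LENGTH:
        return False
    # Integer arithmetic avoids floating-point edge cases at exactly 10%.
    return seq.count("N") * 100 <= MAX_N_PERCENT * len(seq)


candidates = ["ACGUACGUA", "ACGUACGUAC", "ACGUACGUAN", "NNGUACGUAC"]
for candidate in candidates:
    print(candidate, passes_quality_control(candidate))
```

A sequence with exactly 10% ‘N’ residues is retained, because the rule removes only entries where ‘N’ constitutes more than 10% of the length.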


The collection of RNA annotations in one centralized lo- 
cation also allows for cross-database quality control mea- 
sures that were not previously possible (see also Discussion 
section). For example, 21 microRNA sequences deposited 
by miRBase are simultaneously annotated as other RNA 
types by different Expert Databases. These sequences have 
been flagged for the attention of miRBase and those Expert
Databases. Similarly, a number of sequences simultaneously 
annotated with multiple related Rfam families were identi- 
fied. This has been brought to the attention of the Rfam 
team, and the affected RNA families will be imported into
RNAcentral once the problem is resolved. 


SUBMITTING DATA TO RNACENTRAL 


We encourage all RNA biologists who publish the identi- 
fication of novel ncRNA sequences to ensure that they are 
submitted into one of the INSDC databases. New ncRNA 
sequences submitted to INSDC are automatically imported 
into RNAcentral once the data satisfy the quality control
criteria described above. Reasonable assistance can be provided
to Expert Databases wishing to submit annotations of
existing INSDC sequences. In rare cases when the data cannot
be submitted to INSDC, the data may still be imported
into RNAcentral as long as the sequences can be mapped 
to primary INSDC accessions (e.g. contigs in a genome as- 
sembly). The contact form on the RNAcentral website can 
be used to get in touch with the RNAcentral team regarding 
data submission. 


RNACENTRAL WEBSITE 
Website features 


The RNAcentral website is available at http://rnacentral.org 
and offers several ways to access data. First, a text search
for keywords and other metadata is provided. The results of 
such searches are faceted such that the results can be filtered 
further. For example, the user can search for all human ncR- 
NAs, and then easily filter the results to select all rRNAs. 
Facets are provided for Expert Database, RNA type and 
species. The RNAcentral website is also equipped with a se- 
quence search interface powered by the ENA services where 
the user can carry out a similarity search of a query sequence
against all ncRNAs found in RNAcentral. Finally,
all the data can also be accessed programmatically using
the REST API and the FTP archive (http://rnacentral.org/downloads).
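
To make the idea of faceted filtering concrete, the sketch below counts facet values over a small result set and then narrows it, mirroring the example of searching for human ncRNAs and selecting rRNAs. It is a toy in-memory model, not the website's search engine: the record layout, field names and identifiers are hypothetical, and each record is assumed to carry a single Expert Database for simplicity.

```python
from collections import Counter
from typing import Dict, Iterable, List

# The three facets named in the text.
FACETS = ("expert_database", "rna_type", "species")


def facet_counts(entries: Iterable[Dict[str, str]]) -> Dict[str, Counter]:
    """Count how many entries fall under each value of each facet."""
    counts = {facet: Counter() for facet in FACETS}
    for entry in entries:
        for facet in FACETS:
            counts[facet][entry[facet]] += 1
    return counts


def apply_filters(entries: Iterable[Dict[str, str]], **selected: str) -> List[Dict[str, str]]:
    """Keep only the entries matching every selected facet value."""
    return [e for e in entries if all(e[k] == v for k, v in selected.items())]


# Hypothetical search results.
results = [
    {"id": "example-1", "expert_database": "ENA", "rna_type": "rRNA", "species": "Homo sapiens"},
    {"id": "example-2", "expert_database": "RefSeq", "rna_type": "rRNA", "species": "Homo sapiens"},
    {"id": "example-3", "expert_database": "miRBase", "rna_type": "miRNA", "species": "Homo sapiens"},
    {"id": "example-4", "expert_database": "ENA", "rna_type": "rRNA", "species": "Mus musculus"},
]
human = apply_filters(results, species="Homo sapiens")
print(facet_counts(human)["rna_type"])
print([e["id"] for e in apply_filters(human, rna_type="rRNA")])
```

The facet counts are recomputed on the filtered set, so each further selection shows only the values that remain available.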


Genome mapping 


In order to put ncRNA sequences in their genomic context, 
it is important to map the sequences onto their genomic locations.
For example, snoRNAs that are transcribed within
the introns of protein-coding genes become readily appar- 
ent when viewed in a genome browser. Knowing genomic 
locations also enables integration with genome browsers 
and other bioinformatic resources that use genome coordi- 
nates for annotating ncRNAs. 

Since all reference genomes are defined using INSDC- 
accessioned sequences and all RNAcentral sequences are 
based on primary INSDC accessions, it is possible to estab- 
lish a mapping between the RNAcentral entries and their 
genomic coordinates in reference genomes. 

The Ensembl Perl API (20) is used to map the low-level 
INSDC accessions to their top-level genomic coordinates 
(such as chromosomes or contigs) for a number of key 
species, including human, mouse, yeast, fruit fly, worm, 
thale cress and others (the full list of supported species 
is available at the RNAcentral website). Notably, all hu- 
man entries are mapped to the new human genome assem- 
bly, GRCh38, including the miRBase and VEGA Expert 
Database datasets. The genomic coordinates of the RNA- 
central entries can be downloaded in a variety of formats 
from the FTP site or through the REST API. 
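
The arithmetic behind projecting a feature from a low-level accession onto a top-level region can be sketched as follows. This is not the Ensembl Perl API and does not reproduce its behaviour; it assumes a single ungapped placement with 1-based inclusive coordinates, and the Placement type, the function name and the coordinates in the example are hypothetical.

```python
from typing import NamedTuple, Tuple


class Placement(NamedTuple):
    """Where a low-level sequence sits on a top-level region (1-based, inclusive)."""
    region: str   # e.g. a chromosome name
    start: int
    end: int
    strand: int   # +1 if placed on the forward strand, -1 if reversed


def to_top_level(placement: Placement, local_start: int, local_end: int) -> Tuple[str, int, int, int]:
    """Project local feature coordinates onto the top-level region."""
    length = placement.end - placement.start + 1
    if not 1 <= local_start <= local_end <= length:
        raise ValueError("feature lies outside the placed sequence")
    if placement.strand == 1:
        start = placement.start + local_start - 1
        end = placement.start + local_end - 1
    else:
        # On the reverse strand, local position 1 maps to the placement end.
        start = placement.end - local_end + 1
        end = placement.end - local_start + 1
    return placement.region, start, end, placement.strand


# Hypothetical placement of a 1000-nucleotide accession on a chromosome.
forward = Placement("Y", 1000, 1999, 1)
reverse = Placement("Y", 1000, 1999, -1)
print(to_top_level(forward, 1, 448))
print(to_top_level(reverse, 1, 100))
```

Real assemblies involve multiple levels of placement and gapped alignments, which is why an established API is used in practice.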

Whenever genomic mapping is available, RNAcentral se- 
quences can be viewed in their genomic context using a 
light-weight genome browser (http://genoverse.org) where 
the users can interactively explore the genomic neighbor- 
hood without leaving the page (see Figure 2). External links 
are provided to the fully-featured genome browsers such as 
Ensembl (20) and the UCSC genome browser (21). 


Overview of the data 


The current release 1.0 of RNAcentral contains over 8.1 
million unique sequences. The sequences in RNAcentral are 
strongly biased toward ribosomal RNAs (70% of all sequences),
which are used in environmental sampling to identify species.
tRNAs account for a further 10% of RNAcentral
sequences. The distribution of RNAcentral
sequences across species is shown in Figure 3. Bacterial
sequences account for about half of RNAcentral, while
eukaryotes account for about 40% of the sequences. While 
there are far fewer eukaryotic genomes available, each has a 
larger number of RNAs. Vertebrates currently account for 
about one third of all eukaryotic RNAcentral sequences. In 
this section, we will illustrate RNAcentral data using three 
model organism examples. 

The reference S. cerevisiae strain S288C (taxonomic iden- 
tifier taxid:559292) contains 238 RNA sequences according 
to RNAcentral. SGD, the yeast model organism database, 
identifies 424 RNA genes leading to 191 unique sequences. 
In budding yeast tRNA sequences are duplicated many 
times. Twenty-one tRNA sequences are found twice in 
the genome and 13 sequences have more than 10 iden- 
tical copies. There are also two complete copies of the 
rDNA repeats included in the reference genome. Of the 
191 unique sequences in SGD, we can assign 163 (85%) 
as being identical to an RNAcentral sequence. Seventy-five 
sequences are found to be unique to RNAcentral and 28 
sequences are unique to SGD. Of the 28 sequences not 
found in RNAcentral, all but one are encoded by the mitochondrial
genome: 24 tRNAs, two rRNAs and an unclassified
ncRNA sequence. The reference genome of budding
yeast, strain S288C, was changed from taxid:4932 to
the more specific taxid:559292 2 years ago. The source of 
the yeast mitochondrial RNAs has apparently not been up- 
dated to taxid:559292 and thus these mitochondrially en- 
coded tRNAs are not associated with the proper taxid. The 
remaining sequence unique to SGD is SNR17A, an intron-containing
gene. The intron-containing form of SNR17A is
present in RNAcentral, but not the intronless form. Twenty-seven
of the 75 RNAcentral unique sequences contain the
gene’s intron sequence and thus do not represent the mature 
form of the RNA. As many as 47 of the RNAcentral unique 
sequences are likely to be due to partial matches by Rfam 
families creating new unique sequences. In all cases, the full- 
length sequence identical to SGD also exists. For example, 
the snoRNA snR45 from SGD is 172 nucleotides long and 
can be found in URS00000284F1, while Rfam provides a 
171 nucleotide sequence (URS00006C1FAA) that lacks the
final uracil. 
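
The comparison between two resources described in this paragraph reduces to collapsing gene records into distinct sequences and then taking set intersections and differences. The sketch below illustrates this on toy data; the function names and the example sequences are hypothetical, and exact string identity is used as the matching criterion, as in the text.

```python
from typing import Dict, Set


def unique_sequences(genes: Dict[str, str]) -> Set[str]:
    """Collapse gene records (name -> sequence) into distinct RNA sequences."""
    return {seq.upper().replace("T", "U") for seq in genes.values()}


def compare_resources(a: Set[str], b: Set[str]) -> Dict[str, int]:
    """Count sequences shared by both resources and unique to each."""
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


# Toy data: two gene copies with identical sequence collapse into one entry,
# and a truncated copy (missing the final residue) counts as a new sequence.
model_organism_db = {"tRNA-copy-1": "ACGUACGU", "tRNA-copy-2": "ACGUACGU", "snoRNA": "GGCCAU"}
aggregate_db = {"ACGUACGU", "GGCCA"}
print(compare_resources(unique_sequences(model_organism_db), aggregate_db))

# The counts reported in the text are consistent with one another.
assert 163 + 28 == 191      # SGD unique sequences: shared + SGD-only
assert 163 + 75 == 238      # RNAcentral sequences: shared + RNAcentral-only
assert round(100 * 163 / 191) == 85
```

The truncated toy sequence shows why partial matches inflate the number of unique entries: a sequence lacking a single terminal residue is not identical and is therefore counted separately.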

A search for human ncRNAs in RNAcentral using 
the taxonomic identifier (taxid:9606) identifies 75 931 sequences,
which exceeds the number one expects. This number
includes 32 668 miscRNAs, 21 756 lncRNAs, 5139 microRNAs,
4042 rRNAs, etc. The miscRNA category contains
a large number of piRNAs that have not been given
the correct type by submitting authors. There appear to be 
twice as many microRNAs as expected (miRBase annotates
~2500 mature microRNA sequences). The inflated number
is due to many factors including multiple different sequenced
versions of human DNA, which leads to multiple
variants of each RNA sequence. In addition, sequences derived
from the Rfam Expert database again have different 5’
or 3’ ends from the experimentally characterized ends of the
microRNA, meaning that completely new URS sequences
are created.


Figure 2. An RNAcentral entry web page for an lncRNA showing the four sections: (1) Overview and description of the RNA sequence, (2) Annotations
and cross-references to Expert Databases, (3) Genome browser for mapped sequences, (4) Sequence data. The entry shown is URS00003A022F, a 448-nucleotide Homo sapiens lncRNA with annotations from ENA, RefSeq and VEGA.


Figure 3. The species distribution of sequences in RNAcentral. Main categories: Bacteria, Eukaryota, Archaea, Unclassified, Other. Eukaryotes in detail: Vertebrates, Fungi, Viridiplantae, Metazoa (non-vertebrates), Other eukaryotes.

The reference Escherichia coli strain K-12 substr. 
MG1655 (taxid:511145) contains 207 ncRNA genes according
to EcoCyc (22). RNAcentral identifies 367 sequences.
The genome contains 7 ribosomal RNA operons (23),
and in RNAcentral we find 7 full-length LSU sequences
in addition to 14 shorter sequences that correspond
to partial matches to the RNA. For SSU rRNA, we see only
six sequences in RNAcentral. The discrepancy is explained
by the fact that there are two copies of the SSU rRNA (rrnB
and rrnE), which are identical in sequence and found in a
single RNAcentral entry (URS00000ABFE9).


Release schedule 


The current release (1.0) follows a public beta release 
(1.0beta) in June 2014. In the future, the data will be up- 
dated several times a year, coinciding with major new ver- 
sions of the Expert Databases. The website user interface 
will be updated continuously. 


DISCUSSION 


RNAcentral is still at an early stage of development. The 
first release provides a stable accessioned set of RNA se- 
quences, along with sequence and metadata search, bulk 
download, cross-references and integrated genome brows- 
ing functionality. The final goal is to develop a resource 
akin to UniProt for ncRNAs, with rich functional anno- 
tation and identifiers for conceptual biological entities (in 
addition to those assigned to sequences). 

One challenge is that RNAcentral is entirely dependent 
on the quality of the input streams of data. For example, 
there are incorrect annotations of tRNA as rRNA com- 
ing from user submissions in ENA (e.g. JQ737315.1). The 
RNAcentral website enhances our ability to spot inconsis- 
tencies and we intend to provide automated solutions to re- 
fine the data to remove such obvious annotation errors. We 
will improve the provenance of sequences in RNAcentral to 
allow users to select slices of the data for either improved accuracy
or improved coverage. In the current data scheme, if
two RNA sequences differ by even a single nucleotide (including
an N), they will be given two different
URS entries. This is far from ideal and a significant future 
effort will be placed on creating a new entity that groups all 
variants of the same ncRNA from a particular species. Fur- 
ther complications arise when identical RNA sequences are 
found at multiple genomic locations and so it will be impor- 
tant to also have an entity for an RNA gene that includes 
genomic location. 

At present we are far from covering all known ncRNA sequences
in RNAcentral. piRNAs, for example, are poorly
represented in the database. This can be alleviated by in- 
cluding specialist databases such as piRNAbank in RNA- 
central. We have clear future plans to incorporate several 
more RNAcentral Expert databases: NONCODE, CRW, 
plncDB, tRNAdb, sRNAmap, snoRNAdb, SILVA, GreenGenes
and tmRDB. However, not every type of RNA has
its own specialist database, so in the longer term we plan to 
contact the authors of RNA discovery papers to encourage 
submission of sequence data to INSDC. 

A number of features for the database are planned for 
the coming year. These include mapping RNA sequences 
onto secondary and tertiary structure information. We will 
also incorporate the sequences of RNA from the structures 
in the PDB into RNAcentral, which will enable mapping of 
structural information to sequences in RNAcentral. We will 
expand mapping of RNAcentral sequences onto genomes, 
which provides a powerful way to understand contextual in- 
formation about the RNAs. 

We welcome all feedback and suggestions, which can be 
directed to us using the contact form on the RNAcentral 
website, as well as via GitHub and Twitter (the links are 
available at http://rnacentral.org). 